dump temporary security contexts to file in specified shared temporary directory for use in worker submit scripts#524
Conversation
|
@guillaumeeb: due to your concerns, I extracted this one from #519 to look at it separately. I guess the biggest question is how to best create temporary files. One maybe better way may be to create a whole directory together with the job script, which also needs to live somewhere. One could also try to somehow inject this in the scheduler scripts. E.g. if a bash wrapper is used one could use process substitution . But this seems to be a bit complicated to me. I stuck with this one as it is rather minimal. |
That's a good idea! And I agree with the main concern. I'll add a review. |
guillaumeeb
left a comment
There was a problem hiding this comment.
Thanks again for the work.
@jacobtomlinson we could really use your eyes on this one. Does this file generation looks good to you, and security compliant?
in #496 you said:
I think only supporting file certificates in dask-jobqueue would be reasonable. In places like dask-cloudprovider we are limited to passing the certs in the Dask config because we can't guarantee there will be a shared filesystem. But for HPC I think most cases there will be one.
Did you mean by generating a temporary file? And do you think that in-memory context would be feasible for dask-jobqueue by serialization?
there is also some more information in #520.
|
Forget about the deprecation: accidentally looked at the wrong file. Thought tmpfile was finally deprecated as I remembered that there was a discussion in dask#7800 that it was not needed anymore due to the new python tmpfile, which I misunderstood So I guess it makes sense then also to use the built-in tmpfile in this PR. I cannot remember why I didn't do that. |
|
I think #532 should be completed/merged first. Without any testing this is far to shaky... |
|
Moved test across all clusters here: #537 (will fail for current implementation) |
8cd6a59 to
c7174bb
Compare
|
Now, it basically works, the only problem is identifying a writable shared directory for testing. Local still works, htcondor and slurm xfail due to the missing shared filesystem, sge works, however, pbs seems has the permissions on the test directory set as non-writable: what would a better place to write the temporary files? Second question: Does it make sense to default to the home dir as a shared directory? I guess most clusters work like this, however e.g. the CI does not... |
|
BTW: the last remaining error shows that the permissions on the temporary certificate files are set in a secure manner to 0600 . |
…i user * added context handler to Job Cluster in core.py * make source dir owned by pbsuser * dynamically select port to fix sporadic port already in use errors
d7ade92 to
86977af
Compare
|
I thought I was done: now I get an error in the unmodified and I cannot reproduce the error locally. It seems however, the error is related to previously running tests that did not clean up correctly (https://github.com/riedel/dask-jobqueue/runs/4346772070) |
730cf43 to
1fdbe1d
Compare
1fdbe1d to
bf0c957
Compare
riedel
left a comment
There was a problem hiding this comment.
added a few explainations.
guillaumeeb
left a comment
There was a problem hiding this comment.
We need to find an agreement on the configuration name and the default (even that is looks every one agrees for defaulting the config option to None/null).
because it xfails anyways and the problem with the ci is unrelated
tests_required is deprecated but kept as in distributed
guillaumeeb
left a comment
There was a problem hiding this comment.
Just one last suggestion, and then I think we can merge!!
Co-authored-by: Guillaume Eynard-Bontemps <g.eynard.bontemps@gmail.com>
…ask-jobqueue into in_memory_security_contexts_520
d70544e to
18da416
Compare
|
Since #527 (which is the final thing I want to get merged) passes I am happy now. |
|
Thanks a lot for all the work here @riedel, I know this was long and with a lot of waiting periods. |
|
Thank you alot, @guillaumeeb! I am just happy that this part is in now and encrypting traffic now easily works. Also @andersy005 for reviewing. |
Fixes #520
Closes #534